TeX 1995 July

Received: from iesd.auc.dk by cs.umb.edu with SMTP id AA03464
  (5.65c/IDA-1.4.4 for <tex-k@cs.umb.edu>); Sat, 12 Mar 1994 14:43:36 -0500
Received: from loke.iesd.auc.dk (loke.iesd.auc.dk [130.225.48.20])
  by iesd.auc.dk (8.6.5/8.6.5) with ESMTP id UAA10630;
  Sat, 12 Mar 1994 20:43:33 +0100
From: Frank Jensen <fj@iesd.auc.dk>
Received: from localhost (fj@localhost) by loke.iesd.auc.dk (8.6.4/8.6.4)
  id UAA24887; Sat, 12 Mar 1994 20:42:55 +0100
Date: Sat, 12 Mar 1994 20:42:55 +0100
Message-Id: <199403121942.UAA24887@loke.iesd.auc.dk>
To: tex-k@cs.umb.edu
Subject: Kpathsearch 1.7 -- comments on "hash.c"

I have two comments.

1) The hash function is not particularly good.  It computes the sum of
   all characters in the key, multiplies the result by two, and finally
   takes modulo the hash table size.  [The multiplication by two is
   done in a strange way.]

   If we assume that an average key has length between 5 and 10, that
   the ASCII character set is being used, and that a typical character
   is a lower case letter (so that its numeric value is around 100),
   then the hash value of an average key will be between 1000 and 2000
   (assuming the size of the hash table is at least 2000).

   The hash table created by the database search code has size 7603,
   meaning that we can expect the last 5000 buckets to be empty (and we
   can expect a clustering around bucket 1500).

   It will be better to choose a hash function that spreads out the
   keys in different buckets.  Here is one from Knuth's CWEB system
   (rewritten in Kpathsearch style):

     static unsigned
     hash P2C(hash_table_type, table, const_string, key)
     {
       unsigned n = 0;

       while (*key != 0)
         {
           n *= 2;
           n += (unsigned char) *key++;
         }

       return n % table.size;
     }

2) In `hash_create', the buckets are created using `calloc'.  According
   to the ANSI standard, `calloc' initializes all bits to zero, and it
   is explicitly mentioned that this is not necessarily the
   representation of a null pointer (or a floating-point zero).
   So I think it's best to explicitly initialize all buckets with a
   null pointer.

---
Frank Jensen, fj@iesd.auc.dk
Department of Mathematics and Computer Science
Aalborg University
DENMARK
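
To make comment 1 concrete, here is a minimal sketch that puts the two
schemes side by side.  The function names `additive_hash' and
`shift_add_hash' are hypothetical, and `additive_hash' is only a
reconstruction of the scheme as described in the message (sum, double,
modulo); it is not the actual code from hash.c.  Both use ordinary ANSI
prototypes rather than the P2C macro.

```c
/* Assumed reconstruction of the scheme described in comment 1:
   sum all characters of the key, double the sum, and reduce it
   modulo the table size.  Not copied from hash.c.  */
unsigned
additive_hash (const char *key, unsigned table_size)
{
  unsigned n = 0;

  while (*key != 0)
    n += (unsigned char) *key++;

  return (n * 2) % table_size;
}

/* The shift-and-add scheme quoted in the message: doubling the
   running value before each addition makes the result depend on
   the position of every character, not just on the character sum.  */
unsigned
shift_add_hash (const char *key, unsigned table_size)
{
  unsigned n = 0;

  while (*key != 0)
    {
      n *= 2;
      n += (unsigned char) *key++;
    }

  return n % table_size;
}
```

With a table size of 7603 and ASCII, the keys "tex" and "xet" both land
in bucket 674 under the additive scheme, since any two keys made of the
same characters collide.  Under the shift-and-add scheme they land in
buckets 786 and 798.  A key of ten lower case letters can never reach
past bucket 2440 (2 * 10 * 122) with the additive scheme, which is the
clustering the message describes.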
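
Comment 2 could be addressed along the following lines.  This is only a
sketch: the type `bucket_elt' and the function `create_buckets' are
hypothetical stand-ins, since the message does not show the real
declarations in hash.c.

```c
#include <stdlib.h>

/* Hypothetical bucket element; the real type in hash.c may differ.  */
struct bucket_elt
{
  const char *key;
  const char *value;
  struct bucket_elt *next;
};

/* Allocate SIZE bucket heads and store a null pointer in each one
   explicitly, instead of relying on calloc's all-bits-zero fill.
   Returns NULL if SIZE is zero, if the byte count would overflow,
   or if the allocation fails.  */
struct bucket_elt **
create_buckets (size_t size)
{
  struct bucket_elt **buckets;
  size_t i;

  if (size == 0 || size > (size_t) -1 / sizeof *buckets)
    return NULL;

  buckets = malloc (size * sizeof *buckets);
  if (buckets == NULL)
    return NULL;

  for (i = 0; i < size; i++)
    buckets[i] = NULL;

  return buckets;
}
```

The explicit loop costs one pass over the table at creation time, and
it stays correct on an implementation where a null pointer is not
represented by all bits zero.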